[DO NOT MERGE] [Klaud Cold] experimental: MiniMax-M3 MI325X conc 4/8 — apply vllm#45639 (AITER AR + Gemma-RMS fusion) #1772
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… applied MI325X (gfx942) counterpart of #1770. Validate vllm-project/vllm#45639 ("[ROCm][M3] Enable AITER AR + Gemma-RMS fusion for MiniMax-M3") on MI325X before an image rebuild, by applying the PR diff in-place to the shipped minimax-m3 image.
- patches/vllm-45639-aiter-ar-gemma-rms.diff: vendored PR diff.
- minimaxm3arf_fp8_mi325x.sh: applies the diff (idempotent; HARD-FAILS if it neither applies nor is already applied), serves with VLLM_ROCM_USE_AITER=1 + --compilation-config (custom_ops -minimax_gemma_rms_norm, pass_config.fuse_allreduce_rms). BF16 KV (gfx942). Includes a PROFILE=1 --profiler-config gate so the same recipe serves the companion profiling PR.
- amd-master.yaml minimaxm3arf-fp8-mi325x-vllm: model-prefix minimaxm3arf routes to the new recipe; conc 4 and 8, TP8.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
aee12aa to 65be443
        echo "[vllm#45639] already applied to $VLLM_SP/vllm"
    elif ( cd "$VLLM_SP" && patch -p1 --dry-run < "$PATCH_FILE" >/dev/null 2>&1 ); then
        ( cd "$VLLM_SP" && patch -p1 < "$PATCH_FILE" )
        echo "[vllm#45639] applied to $VLLM_SP/vllm"
Patch apply errors ignored
High Severity
After a successful patch --dry-run, the script runs patch -p1 without checking its exit status and always prints that vllm#45639 was applied. A failed apply still starts vllm serve, so the job can finish while benchmarking an image that never received the fusion patch.
Reviewed by Cursor Bugbot for commit 65be443.
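One way to close the gap described in this comment is to check the exit status of the real `patch -p1` run as well as the dry run. The sketch below is not the recipe's actual code: the function name `apply_patch_checked`, its arguments, and the use of a reverse dry run (`patch -R --dry-run`) to detect an already-applied diff are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): apply a vendored diff and fail loudly.
# Usage: apply_patch_checked <target_dir> <absolute_patch_file> <label>
apply_patch_checked() {
    local target_dir="$1"
    local patch_file="$2"   # must be absolute, because we cd before reading it
    local label="$3"

    # Assumption: "already applied" means the reverse patch would apply cleanly.
    if ( cd "$target_dir" && patch -p1 -R --dry-run < "$patch_file" >/dev/null 2>&1 ); then
        echo "[$label] already applied to $target_dir"
        return 0
    fi

    # Neither applied nor applicable: the tree has drifted from the patch base.
    if ! ( cd "$target_dir" && patch -p1 --dry-run < "$patch_file" >/dev/null 2>&1 ); then
        echo "[$label] ERROR: patch neither applies nor is already applied" >&2
        return 1
    fi

    # Real apply: check the exit status instead of assuming the dry run was enough.
    if ! ( cd "$target_dir" && patch -p1 < "$patch_file" ); then
        echo "[$label] ERROR: patch -p1 failed after a clean dry run" >&2
        return 1
    fi

    echo "[$label] applied to $target_dir"
}
```

A caller would then stop before `vllm serve` on any failure, for example `apply_patch_checked "$VLLM_SP" "$PATCH_FILE" "vllm#45639" || exit 1`.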
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27524660670
…ity w/ #1770)
Mirror the #1770 updates onto the MI325X #45639 sweep:
- VLLM_LOGGING_LEVEL=DEBUG + a post-server-ready grep of the server log into the job log, printing the AITER AR+RMS fusion verdict (registration bail warnings => 0 patterns; "Replaced N patterns" / "fusion pass matches" => match count).
- 8k1k conc-16 TP8 row so mark_eval_entries marks an lm-eval (validate #45639 fused-kernel correctness on gfx942); perf points stay at conc 4/8.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
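The post-server-ready verdict described in this commit message could look roughly like the sketch below. It is an illustration only: the function name `fusion_verdict` and the output wording are hypothetical, and the only log text it relies on is the "Replaced N patterns" phrase quoted above, so the exact format of the real vLLM DEBUG lines is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: summarize the fusion pass result from a server log.
# Usage: fusion_verdict <server_log>
fusion_verdict() {
    local log_file="$1"
    local total

    # Sum N over every "Replaced N patterns" line; yields 0 when nothing matches
    # or the log is missing. "|| true" keeps this safe under "set -o pipefail".
    total=$( { grep -oE 'Replaced [0-9]+ patterns' "$log_file" 2>/dev/null || true; } \
        | awk '{ sum += $2 } END { print sum + 0 }' )

    if [ "$total" -gt 0 ]; then
        echo "fusion verdict: MATCHED ($total patterns replaced)"
    else
        echo "fusion verdict: NO MATCH (0 patterns replaced)"
        return 1
    fi
}
```

Returning non-zero on zero matches lets the job decide whether a missing fusion should fail the run or only be recorded in the job log.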
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit eb422fe.
    - isl: 8192
      osl: 1024
      search-space:
      - { tp: 8, conc-list: [ 16 ] }
conc-list breaks full-sweep
Medium Severity
The new single-node fixed-seq-len rows use only conc-list, but generate_full_sweep reads conc-start and conc-end for single-node entries and will raise KeyError when it hits this config during an unfiltered amd-master full sweep.
Reviewed by Cursor Bugbot for commit eb422fe.
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27525819313


What
MI325X (gfx942) counterpart of #1770. Benchmarks MiniMax-M3 MXFP8 on MI325X, conc 4 and 8 (TP8), with vllm-project/vllm#45639 (AITER fused all-reduce + Gemma-RMSNorm) applied in-place to the shipped vllm/vllm-openai-rocm:minimax-m3 image.

How
- benchmarks/single_node/fixed_seq_len/minimaxm3arf_fp8_mi325x.sh:
  - Applies the #45639 diff with patch -p1 (installs patch via apt if missing): idempotent (proceeds if already applied), and hard-fails (exit 1) if it neither applies cleanly nor is already applied (image drifted from m3_release).
  - Serves with VLLM_ROCM_USE_AITER=1, --compilation-config '{"custom_ops": ["-minimax_gemma_rms_norm"], "pass_config": {"fuse_allreduce_rms": true}}'. BF16 KV (gfx942 has no calibrated FP8 attention scales).
  - … minimaxm3_fp8_mi325x.sh; carries a PROFILE=1 --profiler-config gate so the companion profiling PR reuses the recipe.
- amd-master.yaml minimaxm3arf-fp8-mi325x-vllm: distinct model-prefix (minimaxm3arf) routes to the new recipe; conc 4 & 8, TP8. Prod recipe/config untouched.

Validation
bash -n ✓; YAML parses ✓; test-config → 2 jobs (TP8, conc 4 + 8, mi325x, minimaxm3arf_1k1k) ✓.

🤖 Generated with Claude Code
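As a rough illustration of the serving step in the description above, the sketch below assembles the `vllm serve` invocation with the environment variable, tensor-parallel size, and `--compilation-config` JSON quoted in this PR. The function name `serve_with_fusion`, the `RUN_SERVER` switch, and the model path argument are hypothetical. The `PROFILE=1` gate is left out because its `--profiler-config` value is not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: names other than the env var, flags, and JSON are placeholders.
COMPILATION_CONFIG='{"custom_ops": ["-minimax_gemma_rms_norm"], "pass_config": {"fuse_allreduce_rms": true}}'

# Usage: serve_with_fusion <model_path>
# Prints the command by default; set RUN_SERVER=1 to actually start vLLM.
serve_with_fusion() {
    model="$1"
    # Reuse the positional parameters as the argument list (portable, no arrays).
    set -- serve "$model" --tensor-parallel-size 8 \
        --compilation-config "$COMPILATION_CONFIG"

    if [ "${RUN_SERVER:-0}" = "1" ]; then
        export VLLM_ROCM_USE_AITER=1
        exec vllm "$@"
    fi

    # Dry run: show what would be executed (for inspection, not for copy-paste,
    # since the JSON argument is printed without shell quoting).
    echo "VLLM_ROCM_USE_AITER=1 vllm $*"
}
```

Printing by default keeps the sketch safe to run on a machine without vLLM installed. Disabling the `minimax_gemma_rms_norm` custom op and enabling `fuse_allreduce_rms` together mirrors the pairing the PR describes.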
Note
Low Risk
Benchmark-only experimental config and recipe; no changes to production MiniMax-M3 paths. Runtime in-container patching is isolated to marked experimental jobs.
Overview
Adds an experimental, do-not-merge MI325X smoke path to validate vllm#45639 (AITER fused all-reduce + Gemma-RMSNorm for MiniMax-M3) on real gfx942 hardware before the upstream change ships in a rebuilt image.
A new minimaxm3arf-fp8-mi325x-vllm entry in amd-master.yaml uses model-prefix minimaxm3arf so jobs run minimaxm3arf_fp8_mi325x.sh instead of the production MI325X MiniMax-M3 recipe. The sweep is narrow: TP8 at conc 4/8 for 1k1k perf, plus an 8k1k conc-16 row so lm-eval can run under the existing eval policy.

The recipe vendors and applies the #45639 diff to the installed vLLM inside the minimax-m3 container (idempotent apply, hard-fail if the image no longer matches the patch base). Serving then turns on VLLM_ROCM_USE_AITER=1 and fuse_allreduce_rms, and disables the minimax_gemma_rms_norm custom op so the fusion pass can match IR. DEBUG logging plus a post-startup grep "fusion-pass verdict" block records whether the pass registered and replaced patterns.

perf-changelog.yaml documents the new config key. Production minimaxm3* configs are unchanged.

Reviewed by Cursor Bugbot for commit eb422fe.